{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "---\n",
    "sidebar_label: Ollama\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ChatOllama\n",
    "\n",
    "[Ollama](https://ollama.ai/) allows you to run open-source large language models, such as Llama 2, locally.\n",
    "\n",
    "Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. \n",
    "\n",
    "It optimizes setup and configuration details, including GPU usage.\n",
    "\n",
    "For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/jmorganca/ollama#model-library).\n",
    "\n",
    "## Setup\n",
    "\n",
    "First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n",
    "\n",
    "* [Download](https://ollama.ai/download) and install Ollama onto the available supported platforms (including Windows Subsystem for Linux)\n",
    "* Fetch available LLM model via `ollama pull <name-of-model>`\n",
    "    * View a list of available models via the [model library](https://ollama.ai/library)\n",
    "    * e.g., `ollama pull llama3`\n",
    "* This will download the default tagged version of the model. Typically, the default points to the latest, smallest sized-parameter model.\n",
    "\n",
 On Mac">
    "> On Mac, the models will be downloaded to `~/.ollama/models`\n",
    "> \n",
    "> On Linux (or WSL), the models will be stored at `/usr/share/ollama/.ollama/models`\n",
    "\n",
    "* Specify the exact version of the model of interest as such `ollama pull vicuna:13b-v1.5-16k-q4_0` (View the [various tags for the `Vicuna`](https://ollama.ai/library/vicuna/tags) model in this instance)\n",
    "* To view all pulled models, use `ollama list`\n",
    "* To chat directly with a model from the command line, use `ollama run <name-of-model>`\n",
    "* View the [Ollama documentation](https://github.com/jmorganca/ollama) for more commands. Run `ollama help` in the terminal to see available commands too.\n",
    "\n",
    "## Usage\n",
    "\n",
    "You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain.llms.ollama.Ollama.html).\n",
    "\n",
    "If you are using a LLaMA `chat` model (e.g., `ollama pull llama3`) then you can use the `ChatOllama` interface.\n",
    "\n",
    "This includes [special tokens](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) for system message and user input.\n",
    "\n",
    "## Interacting with Models \n",
    "\n",
    "Here are a few ways to interact with pulled local models\n",
    "\n",
    "#### In the terminal:\n",
    "\n",
    "* All of your local models are automatically served on `localhost:11434`\n",
    "* Run `ollama run <name-of-model>` to start interacting via the command line directly\n",
    "\n",
    "#### Via an API\n",
    "\n",
    "Send an `application/json` request to the API endpoint of Ollama to interact.\n",
    "\n",
    "```bash\n",
    "curl http://localhost:11434/api/generate -d '{\n",
    "  \"model\": \"llama3\",\n",
    "  \"prompt\":\"Why is the sky blue?\"\n",
    "}'\n",
    "```\n",
    "\n",
    "See the Ollama [API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md) for all endpoints.\n",
    "\n",
    "#### Via LangChain\n",
    "\n",
    "See a typical basic example of using Ollama via the `ChatOllama` chat model in your LangChain application. \n",
    "\n",
    "View the [API Reference for ChatOllama](https://api.python.langchain.com/en/latest/chat_models/langchain_community.chat_models.ollama.ChatOllama.html#langchain_community.chat_models.ollama.ChatOllama) for more."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Why did the astronaut break up with his girlfriend?\n",
      "\n",
      "Because he needed space!\n"
     ]
    }
   ],
   "source": [
    "# LangChain supports many other chat models. Here, we're using Ollama\n",
    "from langchain_community.chat_models import ChatOllama\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "# supports many more optional parameters. Hover on your `ChatOllama(...)`\n",
    "# class to view the latest available supported parameters\n",
    "llm = ChatOllama(model=\"llama3\")\n",
    "prompt = ChatPromptTemplate.from_template(\"Tell me a short joke about {topic}\")\n",
    "\n",
    "# using LangChain Expressive Language chain syntax\n",
    "# learn more about the LCEL on\n",
    "# /docs/concepts/#langchain-expression-language-lcel\n",
    "chain = prompt | llm | StrOutputParser()\n",
    "\n",
    "# for brevity, response is printed in terminal\n",
    "# You can use LangServe to deploy your application for\n",
    "# production\n",
    "print(chain.invoke({\"topic\": \"Space travel\"}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "LCEL chains, out of the box, provide extra functionalities, such as streaming of responses, and async support"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Why\n",
      " did\n",
      " the\n",
      " astronaut\n",
      " break\n",
      " up\n",
      " with\n",
      " his\n",
      " girlfriend\n",
      " before\n",
      " going\n",
      " to\n",
      " Mars\n",
      "?\n",
      "\n",
      "\n",
      "Because\n",
      " he\n",
      " needed\n",
      " space\n",
      "!\n",
      "\n"
     ]
    }
   ],
   "source": [
    "topic = {\"topic\": \"Space travel\"}\n",
    "\n",
    "for chunks in chain.stream(topic):\n",
    "    print(chunks)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For streaming async support, here's an example - all possible via the single chain created above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "topic = {\"topic\": \"Space travel\"}\n",
    "\n",
    "async for chunks in chain.astream(topic):\n",
    "    print(chunks)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look at the [LangChain Expressive Language (LCEL) Interface](/docs/concepts#interface) for the other available interfaces for use when a chain is created.\n",
    "\n",
    "## Building from source\n",
    "\n",
    "For up to date instructions on building from source, check the Ollama documentation on [Building from Source](https://github.com/ollama/ollama?tab=readme-ov-file#building)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extraction\n",
    " \n",
    "Use the latest version of Ollama and supply the [`format`](https://github.com/jmorganca/ollama/blob/main/docs/api.md#json-mode) flag. The `format` flag will force the model to produce the response in JSON.\n",
    "\n",
    "> **Note:** You can also try out the experimental [OllamaFunctions](/docs/integrations/chat/ollama_functions) wrapper for convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.chat_models import ChatOllama\n",
    "\n",
    "llm = ChatOllama(model=\"llama3\", format=\"json\", temperature=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "content='{ \"morning\": \"blue\", \"noon\": \"clear blue\", \"afternoon\": \"hazy yellow\", \"evening\": \"orange-red\" }\\n\\n  \\n\\n\\n\\n\\n\\n  \\n\\n\\n\\n\\n\\n  \\n\\n\\n\\n\\n\\n  \\n\\n\\n\\n\\n\\n  \\n\\n\\n\\n\\n\\n  \\n\\n\\n\\n\\n\\n  \\n\\n\\n\\n\\n\\n  \\n\\n\\n\\n\\n\\n  \\n\\n\\n\\n\\n\\n  \\n\\n\\n\\n\\n\\n ' id='run-e893700f-e2d0-4df8-ad86-17525dcee318-0'\n"
     ]
    }
   ],
   "source": [
    "from langchain_core.messages import HumanMessage\n",
    "\n",
    "messages = [\n",
    "    HumanMessage(\n",
    "        content=\"What color is the sky at different times of the day? Respond using JSON\"\n",
    "    )\n",
    "]\n",
    "\n",
    "chat_model_response = llm.invoke(messages)\n",
    "print(chat_model_response)"
   ]
  },
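  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `format=\"json\"` constrains the model to emit valid JSON, the response content can be parsed directly. A minimal sketch (the key names assume the response shown above):\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "# Parse the JSON string returned by the model\n",
    "sky_colors = json.loads(chat_model_response.content)\n",
    "print(sky_colors[\"morning\"])\n",
    "```"
   ]
  },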
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Name: John\n",
      "Age: 35\n",
      "Likes: Pizza\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "\n",
    "from langchain_community.chat_models import ChatOllama\n",
    "from langchain_core.messages import HumanMessage\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "json_schema = {\n",
    "    \"title\": \"Person\",\n",
    "    \"description\": \"Identifying information about a person.\",\n",
    "    \"type\": \"object\",\n",
    "    \"properties\": {\n",
    "        \"name\": {\"title\": \"Name\", \"description\": \"The person's name\", \"type\": \"string\"},\n",
    "        \"age\": {\"title\": \"Age\", \"description\": \"The person's age\", \"type\": \"integer\"},\n",
    "        \"fav_food\": {\n",
    "            \"title\": \"Fav Food\",\n",
    "            \"description\": \"The person's favorite food\",\n",
    "            \"type\": \"string\",\n",
    "        },\n",
    "    },\n",
    "    \"required\": [\"name\", \"age\"],\n",
    "}\n",
    "\n",
    "llm = ChatOllama(model=\"llama2\")\n",
    "\n",
    "messages = [\n",
    "    HumanMessage(\n",
    "        content=\"Please tell me about a person using the following JSON schema:\"\n",
    "    ),\n",
    "    HumanMessage(content=\"{dumps}\"),\n",
    "    HumanMessage(\n",
    "        content=\"Now, considering the schema, tell me about a person named John who is 35 years old and loves pizza.\"\n",
    "    ),\n",
    "]\n",
    "\n",
    "prompt = ChatPromptTemplate.from_messages(messages)\n",
    "dumps = json.dumps(json_schema, indent=2)\n",
    "\n",
    "chain = prompt | llm | StrOutputParser()\n",
    "\n",
    "print(chain.invoke({\"dumps\": dumps}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multi-modal\n",
    "\n",
    "Ollama has support for multi-modal LLMs, such as [bakllava](https://ollama.ai/library/bakllava) and [llava](https://ollama.ai/library/llava).\n",
    "\n",
    "Browse the full set of versions for models with `tags`, such as [Llava](https://ollama.ai/library/llava/tags).\n",
    "\n",
    "Download the desired LLM via `ollama pull bakllava`\n",
    "\n",
    "Be sure to update Ollama so that you have the most recent version to support multi-modal.\n",
    "\n",
    "Check out the typical example of how to use ChatOllama multi-modal support below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "!pip install --upgrade --quiet  pillow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"\" />"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import base64\n",
    "from io import BytesIO\n",
    "\n",
    "from IPython.display import HTML, display\n",
    "from PIL import Image\n",
    "\n",
    "\n",
    "def convert_to_base64(pil_image):\n",
    "    \"\"\"\n",
    "    Convert PIL images to Base64 encoded strings\n",
    "\n",
    "    :param pil_image: PIL image\n",
    "    :return: Re-sized Base64 string\n",
    "    \"\"\"\n",
    "\n",
    "    buffered = BytesIO()\n",
    "    pil_image.save(buffered, format=\"JPEG\")  # You can change the format if needed\n",
    "    img_str = base64.b64encode(buffered.getvalue()).decode(\"utf-8\")\n",
    "    return img_str\n",
    "\n",
    "\n",
    "def plt_img_base64(img_base64):\n",
    "    \"\"\"\n",
    "    Disply base64 encoded string as image\n",
    "\n",
    "    :param img_base64:  Base64 string\n",
    "    \"\"\"\n",
    "    # Create an HTML img tag with the base64 string as the source\n",
    "    image_html = f'<img src=\"data:image/jpeg;base64,{img_base64}\" />'\n",
    "    # Display the image by rendering the HTML\n",
    "    display(HTML(image_html))\n",
    "\n",
    "\n",
    "file_path = \"../../../static/img/ollama_example_img.jpg\"\n",
    "pil_image = Image.open(file_path)\n",
    "\n",
    "image_b64 = convert_to_base64(pil_image)\n",
    "plt_img_base64(image_b64)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "90%\n"
     ]
    }
   ],
   "source": [
    "from langchain_community.chat_models import ChatOllama\n",
    "from langchain_core.messages import HumanMessage\n",
    "\n",
    "llm = ChatOllama(model=\"bakllava\", temperature=0)\n",
    "\n",
    "\n",
    "def prompt_func(data):\n",
    "    text = data[\"text\"]\n",
    "    image = data[\"image\"]\n",
    "\n",
    "    image_part = {\n",
    "        \"type\": \"image_url\",\n",
    "        \"image_url\": f\"data:image/jpeg;base64,{image}\",\n",
    "    }\n",
    "\n",
    "    content_parts = []\n",
    "\n",
    "    text_part = {\"type\": \"text\", \"text\": text}\n",
    "\n",
    "    content_parts.append(image_part)\n",
    "    content_parts.append(text_part)\n",
    "\n",
    "    return [HumanMessage(content=content_parts)]\n",
    "\n",
    "\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "\n",
    "chain = prompt_func | llm | StrOutputParser()\n",
    "\n",
    "query_chain = chain.invoke(\n",
    "    {\"text\": \"What is the Dollar-based gross retention rate?\", \"image\": image_b64}\n",
    ")\n",
    "\n",
    "print(query_chain)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Concurrency Features\n",
    "\n",
    "Ollama supports concurrency inference for a single model, and or loading multiple models simulatenously (at least [version 0.1.33](https://github.com/ollama/ollama/releases)).\n",
    "\n",
    "Start the Ollama server with:\n",
    "\n",
    "* `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model\n",
    "* `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously\n",
    "\n",
    "Example: `OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve`\n",
    "\n",
    "Learn more about configuring Ollama server in [the official guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
